Unification-based Multimodal Integration

Abstract

Recent empirical research has shown conclusive advantages of multimodal interaction over speech-only interaction for map-based tasks. This paper describes a multimodal language processing architecture which supports interfaces allowing simultaneous input from speech and gesture recognition. Integration of spoken and gestural input is driven by unification of typed feature structures representing the semantic contributions of the different modes. This integration method allows the component modalities to mutually compensate for each other's errors. It is implemented in QuickSet, a multimodal (pen/voice) system that enables users to set up and control distributed interactive simulations.

1 Introduction

By providing a number of channels through which information may pass between user and computer, multimodal interfaces promise to significantly increase the bandwidth and fluidity of the interface between humans and machines. In this work, we are concerned with the addition of multimodal input to the interface. In particular, we focus on interfaces which support simultaneous input from speech and pen, utilizing speech recognition and recognition of gestures and drawings made with a pen on a complex visual display, such as a map.

Our focus on multimodal interfaces is motivated, in part, by the trend toward portable computing devices for which complex graphical user interfaces are infeasible. For such devices, speech and gesture will be the primary means of user input. Recent empirical results (Oviatt 1996) demonstrate clear task performance and user preference advantages for multimodal interfaces over speech-only interfaces, in particular for spatial tasks such as those involving maps. Specifically, in a within-subject experiment in which the same users performed the same tasks using only speech, only pen, or both speech and pen-based input, users' multimodal input to maps resulted in 10% faster task completion time, 23% fewer words, 35% fewer spoken disfluencies, and 36% fewer task errors compared to unimodal spoken input. Of the user errors, 48% involved location errors on the map; these errors were nearly eliminated by the simple ability to use pen-based input. Finally, 100% of users indicated a preference for multimodal interaction over speech-only interaction with maps. These results indicate that for map-based tasks, users would both perform better and be more satisfied when using a multimodal interface.

As an illustrative example, in the distributed simulation application we describe in this paper, one user task is to add a "phase line" to a map. In the existing unimodal interface for this application (CommandTalk, Moore 1997), this is accomplished with a spoken utterance such as 'CREATE A LINE FROM COORDINATES NINE FOUR THREE NINE THREE ONE TO NINE EIGHT NINE NINE FIVE ZERO AND CALL IT PHASE LINE GREEN'. In contrast, the same task can be accomplished by saying 'PHASE LINE GREEN' and simultaneously drawing the gesture in Figure 1.
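To make the integration method concrete, the following is a minimal sketch in Python of how unification of feature structures can combine the spoken phrase 'PHASE LINE GREEN' with a simultaneously drawn line gesture into a single command. The feature names, the VAR placeholder for underspecified values, and the simplified untyped unification routine are assumptions made for this sketch only; they are not QuickSet's actual representation, which uses typed feature structures and a type hierarchy.

    # Illustrative sketch of unification-based multimodal integration.
    # Feature names and values are invented for this example and do not
    # reproduce QuickSet's actual feature structures.

    VAR = object()  # placeholder marking a feature left unfilled by one mode

    def unify(a, b):
        """Unify two feature structures (nested dicts or atomic values).

        Returns the merged structure, or None if the two structures clash.
        """
        if a is VAR:
            return b
        if b is VAR:
            return a
        if isinstance(a, dict) and isinstance(b, dict):
            merged = dict(a)
            for key, b_val in b.items():
                if key in merged:
                    sub = unify(merged[key], b_val)
                    if sub is None:
                        return None       # conflicting values: unification fails
                    merged[key] = sub
                else:
                    merged[key] = b_val
            return merged
        return a if a == b else None      # atomic values must match exactly

    # Semantic contribution of the spoken phrase "PHASE LINE GREEN":
    # it supplies the object's style and label but leaves its location open.
    speech = {
        "type": "create_line",
        "object": {"style": "phase_line", "label": "GREEN", "coords": VAR},
    }

    # Semantic contribution of the pen gesture: a line drawn on the map,
    # which supplies coordinates but no style or label.
    gesture = {
        "type": "create_line",
        "object": {"coords": [(943, 931), (989, 950)]},
    }

    command = unify(speech, gesture)
    print(command)
    # {'type': 'create_line', 'object': {'style': 'phase_line',
    #  'label': 'GREEN', 'coords': [(943, 931), (989, 950)]}}

Because the spoken phrase leaves the coordinates underspecified and the gesture leaves the style and label underspecified, unification succeeds only when the two contributions are compatible; incompatible pairings fail, which is one way the component modalities can compensate for each other's recognition errors.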


Similar articles

Unification-based Multimodal Parsing

In order to realize their full potential, multimodal systems need to support not just input from multiple modes, but also synchronized integration of modes. Johnston et al. (1997) model this integration using a unification operation over typed feature structures. This is an effective solution for a broad class of systems, but limits multimodal utterances to combinations of a single spoken phrase...

Understanding Multimodal Interaction by Exploiting Unification and Integration Rules

This paper presents a model for synergistic integration of multimodal speech and pen information. The model consists of an algorithm for matching and integrating interpretations of inputs from different modalities, as well as of a grammar that constrains integration. Integration proper is achieved by unifying feature structures. The integrator is part of a general framework for multimodal infor...


Multimodal language processing

Multimodal interfaces enable more natural and effective human-computer interaction by providing multiple channels through which input or output may pass. In order to realize their full potential, they need to support not just input from multiple modes, but also synchronized integration of semantic content from different modes. This paper describes a multimodal language processing architecture which a...

Using HPSG to represent multi-modal grammar in multi-modal dialogue

In order to realize their full potential, multimodal systems need to support not just synchronized integration of multiple input modalities, but also a consistent, easy-to-use interface that isolates integration strategies from application-specific, ad hoc handling. As the range of supported multi-modal utterances is extended, the types of input modalities increase, and the utterances supported from individua...

UI on the Fly: Generating a Multimodal User Interface

UI on the Fly is a system that dynamically presents coordinated multimodal content through natural language and a small-screen graphical user interface. It adapts to the user’s preferences and situation. Multimodal Functional Unification Grammar (MUG) is a unification-based formalism that uses rules to generate content that is coordinated across several communication modes. Faithful variants ar...

Publication date: 2002